{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# \ud83d\udcdd Exercise M3.02\n",
    "\n",
    "The goal is to find the best set of hyperparameters which maximize the\n",
    "generalization performance on a training set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import fetch_california_housing\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "data, target = fetch_california_housing(return_X_y=True, as_frame=True)\n",
    "target *= 100  # rescale the target in k$\n",
    "\n",
    "data_train, data_test, target_train, target_test = train_test_split(\n",
    "    data, target, random_state=42\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this exercise, we progressively define the regression pipeline and later\n",
    "tune its hyperparameters.\n",
    "\n",
    "Start by defining a pipeline that:\n",
    "* uses a `StandardScaler` to normalize the numerical data;\n",
    "* uses a `sklearn.neighbors.KNeighborsRegressor` as a predictive model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `RandomizedSearchCV` with `n_iter=20` and\n",
    "`scoring=\"neg_mean_absolute_error\"` to tune the following hyperparameters\n",
    "of the `model`:\n",
    "\n",
    "- the parameter `n_neighbors` of the `KNeighborsRegressor` with values\n",
    "  `np.logspace(0, 3, num=10).astype(np.int32)`;\n",
    "- the parameter `with_mean` of the `StandardScaler` with possible values\n",
    "  `True` or `False`;\n",
    "- the parameter `with_std` of the `StandardScaler` with possible values `True`\n",
    "  or `False`.\n",
    "\n",
    "The `scoring` function is expected to return higher values for better models,\n",
    "since grid/random search objects **maximize** it. Because of that, error\n",
    "metrics like `mean_absolute_error` must be negated (using the `neg_` prefix)\n",
    "to work correctly (remember lower errors represent better models).\n",
    "\n",
    "Notice that in the notebook \"Hyperparameter tuning by randomized-search\" we\n",
    "pass distributions to be sampled by the `RandomizedSearchCV`. In this case we\n",
    "define a fixed grid of hyperparameters to be explored. Using a `GridSearchCV`\n",
    "instead would explore all the possible combinations on the grid, which can be\n",
    "costly to compute for large grids, whereas the parameter `n_iter` of the\n",
    "`RandomizedSearchCV` controls the number of different random combination that\n",
    "are evaluated. Notice that setting `n_iter` larger than the number of possible\n",
    "combinations in a grid (in this case 10 x 2 x 2 = 40) would lead to repeating\n",
    "already-explored combinations.\n",
    "\n",
    "Once the computation has completed, print the best combination of parameters\n",
    "stored in the `best_params_` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "main_language": "python"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}